This notebook concerns the analysis of data collected from the Y Plantroom from the Belimo Sensors installed in the period of November 2024 - Febuary 2025. From plotting some basic time series and characteristic graphs of the data, I noticed that the Constant Temperature (CT) systems are negligble in comparison to the Variable Temperature (VT) systems, so the remainder of the notebook focuses on modelling the load in the Cripps and Fisher VT systems using the availble data from the Building Managment System (BMS) - Priva.
Constant and Variable Temperature Systems
A constant temperature system is one in which the temperature of the water circulating the system does not change with outside factors, e.g. outside temperature. These systems are used for providing domestic hot water (DHW) as you want the same temperature hot water in all conditions.
A variable temeprature system is one in which the temperature of the water cicualting the system does change with outside factors, e.g. outside temperature. These systems are used for space heating, as different amounts of heat will be required to heat a room to a constant temperature depending on external factors.
Data Sources
We have data from Belimo sensors in the following locations:
Fisher VT (1.1)
Fisher CT (1.3)
Cripps VT (1.2)
Cripps CT (1.4)
MCC1 (1.5) - Not in the scope of this analysis
Each Belimo Sensor Measures: Flow Temperature, Return Temperature, Flow Rate, Flow Velocity in 30 second intervals and uses these metrics to calculate the Heating Load on each system $ ( P = m c T )$. Each group of sensors also has one outside temperature sensor.
The data from these Belimo Sensors was then supplmented by the following BMS readings from Priva:
Fisher VT Flow Temperature
Cripps VT Flow Temperature
Fisher Boiler Room Outside Temperature
The BMS measures the above data points in 8 minute intervals.
Code
# Install requirementsimport pandas as pdimport plotly.offline as pyimport plotly.graph_objects as gofrom plotly.subplots import make_subplotsimport numpy as npimport matplotlib.pyplot as pltfrom scipy.stats import linregressfrom sklearn.model_selection import train_test_splitfrom sklearn.linear_model import LinearRegressionfrom sklearn.metrics import root_mean_squared_error, r2_score, mean_absolute_percentage_errorfrom sklearn.preprocessing import StandardScalerimport plotly.io as piopio.renderers.default ="notebook_connected"
Code
# Fetch the Cleaned Belimo Data (run Clean_Belimo_Data.py to produce the file)# Create Key for 1_1, 1_2, ect codeslegend_lookup = {"1_1" : "Fisher VT","1_2" : "Cripps VT","1_3" : "Cripps New CT","1_4" : "Cripps Old CT",}data = pd.read_parquet("BelimoData/Combined.parquet")data_12hr = data.resample("12h").mean()
Analysis of Load By Heating System
Code
fig1 = go.Figure()for i in data_12hr.columns:if i !="1_5"and i!="Total P"and i.split(" ")[-1] !="Outside"and i.split(" ")[-1] !="Flow": fig1.add_trace(go.Scatter( name = legend_lookup[i], x = data_12hr.index, y = data_12hr[i], stackgroup='one' ))fig1.add_trace(go.Scatter( name ="Total Load", x = data_12hr.index, y = data_12hr["Total P"], ))fig1.update_layout( title=dict( text="Y-Plantroom Load by Heating System" ), xaxis=dict( title=dict( text="Timestamp" ) ), yaxis=dict( title=dict( text="Load - 12 hour Average (W)" ) ), legend=dict( title=dict( text="Schematic Position" ) ))fig1.show()
The above graph shows how the total load of the Y-plantroom, averaged over 12 hour intervals, varies with time and how it breaks down into individual heating systems.
Key observation from this graph include:
Cripps New CT’s load is on the order of about ~80 kW peaking at ~ 186 kW, this is signficiantly less any other of the systems, being ~2x less than the VT systems at its nominal value
Cripps Old CT’s load is on the order of about ~30 kW peaking at ~47 kW, about ~10x less than either VT system
Fisher VT’s load is on the order of about ~150 kW peaking at ~242 kW
Cripps VT’s load is on the order of about ~130 kW peaking at ~147 kW
THus we can see that the CT systems hold a smaller share of the load (~ 120 kW) compared to the (~ 389 kW) for the VT systems, thus if we are looking to make the most significant optimisation opportuntites we should start with them.
It is also important to note that the plantroom has a 2 MW capacity but this chart shows our total load as never surpassing 500 kW so the boilers are ~4x oversized.
This is the reason the majority of this notebook focuses on them.
Load-Temperature Characteristic By Heating System
Code
data_6hr = data.resample("6h").mean()fig2 = go.Figure()for i in data.columns:if i !="1_5"and i.split(" ")[-1] !="Outside"and i.split(" ")[-1] !="Flow"and i !="Total P": fig2.add_trace(go.Scatter( name = legend_lookup[i], x = data_6hr[i +" T Outside"], y = data_6hr[i], mode ="markers" ))fig2.update_layout( title=dict( text="Temperature-Load Characterists by System" ), xaxis=dict( title=dict( text="Outside Temperature (°C)" ) ), yaxis=dict( title=dict( text="Load - 6 hour Average (W)" ) ), legend=dict( title=dict( text="Schematic Position" ) ))fig2.show()
The above plot shows how the 6 hour average load varies with the outside temperature.
We can clearly see the following from this plot: - Both VT systems show a negative correlation between load and outside temperature - Both CT systems show no correlation between load and outside temperature
This is what we would expect as increasing outside temperature will result in the ambient space temperature increasing and thus the heating system will have to do less work to maintain a consistant temperature, thus resulting in a lower load.
Since there are negligble losses, the CT system will have no dependance on outside temperature, as the amount of heat required, and thus the load, is a function of demand only.
Heating Curve
Code
data_hourly = data.resample("1h").mean()fig12 = go.Figure()fig12.add_trace(go.Scatter( name ="Cripps VT", x = data_hourly["1_2"+" T Outside"], y = data_hourly["1_2"+" T Flow"], mode ="markers" ))fig12.add_trace(go.Scatter( name ="Fisher VT", x = data_hourly["1_1"+" T Outside"], y = data_hourly["1_1"+" T Flow"], mode ="markers" ))fig12.update_layout( title=dict( text="Heating Curves for VT Systems (Hourly Average)" ), xaxis=dict( title=dict( text="Outside Temperature (°C)" ) ), yaxis=dict( title=dict( text="Flow Temperature (°C)" ) ), legend=dict( title=dict( text="Series" ) ))fig12.show()
A Heating Curve is a graph which describes the VT supply temperature as a function of the ambient temperature.
image.png
For each system you should expect to have 2 heating curves that look similar to the picture above. One heating curve would be for during the day and the other for during the night. The heating curve should plataue at high temperature as the temperature at which the boiler can heat water to has an upper limit.
These 2 heating curves are visisble in the Fisher VT data, as shown by the parralell lines. However in the cripps there is much more noise, with one line faintly visible through the middle. This line corresponds to the moments at which the load is controlled by the weather conditions. There are 2 fixed lines at 30°C and 65°C, these must correspond to when the BMS has taken control and altered the load manually. In these regions the system is operating as a CT system rarther than a VT system, which is in theory less energy efficient and so is worth some investigation.
fig3 = make_subplots(specs=[[{"secondary_y": True}]])fig3.add_trace( go.Scatter(x=data_sample.index, y=data_sample["Fisher VT Flow-temperature"], name="Fisher BMS"), secondary_y=False)fig3.add_trace( go.Scatter(x=data_sample.index, y=data_sample["1_1 T Flow"], name="Fisher Belimo"), secondary_y=False)fig3.add_trace( go.Scatter(x=data_sample.index, y=data_sample["Cripps VT Flow-temperature"], name="Cripps BMS"), secondary_y=False)fig3.add_trace( go.Scatter(x=data_sample.index, y=data_sample["1_2 T Flow"], name="Cripps Belimo"), secondary_y=False)fig3.update_layout( title=dict( text="Belimo and BMS Flow Temperature Sensors plotted against time" ), xaxis=dict( title=dict( text="Timestamp" ) ), yaxis=dict( title=dict( text="Flow Temperature (°C)" ) ), legend=dict( title=dict( text="Series" ) ))fig3.show()
The above time series plot compares the Belimo Flow temperature results with the BMS results. We can see that in oth cases the curves follow each other nicely, having the same peaks and troughs at the same times. The belimo data does deviate slightly from the BMS data, holding a lower value at all times.
I would guess this is due to differences in sensor placement but requires further investigation.
x = np.arange(30,73, 2)fig4 = go.Figure()fig4.add_trace(go.Scatter( name ="Fisher VT", x = data_sample["Fisher VT Flow-temperature"], y = data_sample["1_1 T Flow"], mode ="markers", line=dict(color="#00CC96") ))fig4.add_trace(go.Scatter( name ="Fisher VT", x = x, y = fisher.slope * x + fisher.intercept, mode ="lines", line=dict(color="#00CC96") ))fig4.add_trace(go.Scatter( name ="Cripps VT", x = data_sample["Cripps VT Flow-temperature"], y = data_sample["1_2 T Flow"], mode ="markers", line=dict(color="#AB63FA") ))fig4.add_trace(go.Scatter( name ="Cripps VT", x = x, y = cripps.slope * x + cripps.intercept, mode ="lines", line=dict(color="#AB63FA") ))fig4.update_layout( title=dict( text="Y Plantroom Belimo vs BMS Flow Temperature" ), xaxis=dict( title=dict( text="BMS Flow Temperature (°C)" ) ), yaxis=dict( title=dict( text="Belimo Reading (°C)" ) ), legend=dict( title=dict( text="Series" ) ))fig4.show()
The above graph and regression summaries show a clear linear realtionship between the belimo reading and flow temperature. There are some anomolous values which distort the regression resulting in the lines deviting from the y=x curve we would expect.
I remove these anomlies to produce the following regressions:
Code
# Remove anamolous Values and repeat regressiondata_8min1 = data_8min.loc[data_8min["1_1 T Flow"] !=65.5]data_8min1 = data_8min1.loc[data_8min1["1_2 T Flow"] !=29.67]data_8min1 = data_8min1.loc[data_8min1["1_1 T Flow"] !=66.28]data_8min1 = data_8min1.loc[data_8min1["1_2 T Flow"] !=29.67]data_sample1 = data_8min.resample("12h").mean()cripps = linregress(x = data_8min1["Cripps VT Flow-temperature"],y = data_8min1["1_2 T Flow"])fisher = linregress(x = data_8min1["Fisher VT Flow-temperature"],y = data_8min1["1_1 T Flow"])print("Improved Regressions:")print(f"""---Cripps VT regression:Gradient = {cripps.slope}Intercept = {cripps.intercept}R^2 Value = {cripps.rvalue **2}""")print(f"""---Fisher VT regression:Gradient = {fisher.slope}Intercept = {fisher.intercept}R^2 Value = {fisher.rvalue **2}""")
x = np.arange(30,73, 2)fig5 = go.Figure()fig5.add_trace(go.Scatter( name ="Fisher VT", x = data_sample1["Fisher VT Flow-temperature"], y = data_sample1["1_1 T Flow"], mode ="markers", line=dict(color="#00CC96") ))fig5.add_trace(go.Scatter( name ="Fisher VT", x = x, y = fisher.slope * x + fisher.intercept, mode ="lines", line=dict(color="#00CC96") ))fig5.add_trace(go.Scatter( name ="Cripps VT", x = data_sample1["Cripps VT Flow-temperature"], y = data_sample1["1_2 T Flow"], mode ="markers", line=dict(color="#AB63FA") ))fig5.add_trace(go.Scatter( name ="Cripps VT", x = x, y = cripps.slope * x + cripps.intercept, mode ="lines", line=dict(color="#AB63FA") ))fig5.update_layout( title=dict( text="Y Plantroom Belimo vs BMS sensors" ), xaxis=dict( title=dict( text="BMS Flow Temperature (°C)" ) ), yaxis=dict( title=dict( text="Belimo Reading (°C)" ) ), legend=dict( title=dict( text="Series" ) ))fig5.show()
We can see that removing these anomolous values results in a much tighter fit to to the data, with greater \(R^2\) values and gradients closer to the 1 we expect. Both still have larger deviations from 1 and 0 in the gradients and intercepts, which I belive is a followthrough of the error we obersved in the time series.
This deviation requires further investigation as to why it is present.
The strong correlation between the belimo and BMS flow readings indicate that there is consistancy between the 2 data sources and thus usins BMS data to model the load as measured by Belimo is appropriate.
Creating a Linear Model for Load
This section follows the production of a linear model for the steady state load in terms of BMS Flow Temperature and Outside Temperature readings. I am using the load calculated from Belimo sensors to train the model, in concordance with this BMS data.
The model I have applied is a basic linear regression. A linear regression using an Ordinary Least Squares (OLS) method.
An Aside on Linear Regressions
A linear model takes the form of:
\[
y = m_1x_1 + m_2x_2 + ... + m_ix_i + c + \varepsilon
\]
Where \(x_i\) is an input variable, \(m_i\) is its respective coefficient, \(c\) is a static offset/intercept, \(\varepsilon\) is the error, and \(y\) is the output.
An OLS regression works by taking an input of a dataset of known values of \(\textbf{x}\) and their respective \(y\). It then uses a variety of mathmatical techniques (explored in this wikipedia article) to find the values of \(c\) and \(M\) which minimise the square of the residuals (difference between observed and predicted values). We square the residual to ensure that the positive and negative errors do not cancel one another out, additionaly squaring errors gives more weight to larger errors, resulting in the model reducing bigger mistakes.
In the diagram below the white lines are the residuals, the red points are the training dataset and the black line is the model. The best model is when the length of all the white lines is minimised.
image.png
More information into OLS models and applying them in python can be found here
Quantifing model performance
To quantify how good our model is I will use the following metrics: - \(R^2\) or the coefficient of determination. It is the proportion of variation in the dependent variable that is predictable from the independent variables - i.e. how well the model responds to nuance in the data. 1 indicates a perfect fit and 0 indicates that no variablity in the dependant varaible is explained by the model. - RMSE or Root mean squared error is an indication of the average residual between the model and the real value. It gives us an indication of the average absolute uncertainty in the model. Since we are squaring all of the residuals this metric is sensitive to large outliers in the data. - MAPE or Mean Absoulte Percentage Error is another indication of the average percentage uncertainty in the model. Instead of squaring the residuals this function takes the absoulte value and so is less sensitive to outliers.
More information about these metrics can be found here A usefule guid on interpreting \(R^2\) can be found here
Code
fig6 = go.Figure()fig6.add_trace(go.Scatter( name ="Fisher VT", x = data_sample["Fisher VT Flow-temperature"], y = data_sample["1_1"], mode ="markers", line=dict(color="#00CC96") ))fig6.add_trace(go.Scatter( name ="Cripps VT", x = data_sample["Cripps VT Flow-temperature"], y = data_sample["1_2"], mode ="markers", line=dict(color="#AB63FA") ))fig6.update_layout( title=dict( text="Belimo Load vs BMS Flow Tempreature Sensors" ), xaxis=dict( title=dict( text="BMS Flow Temperature (°C)" ) ), yaxis=dict( title=dict( text="Load from Belimo (W)" ) ), legend=dict( title=dict( text="Series" ) ))fig6.show()
Code
fig7 = go.Figure()fig7.add_trace(go.Scatter( name ="Fisher VT", x = data_sample["Outside-temperature"], y = data_sample["1_1"], mode ="markers", line=dict(color="#00CC96") ))fig7.add_trace(go.Scatter( name ="Cripps VT", x = data_sample["Outside-temperature"], y = data_sample["1_2"], mode ="markers", line=dict(color="#AB63FA") ))fig7.update_layout( title=dict( text="Belimo Load vs BMS Outdoor Sensor" ), xaxis=dict( title=dict( text="BMS Outside Temperature (°C)" ) ), yaxis=dict( title=dict( text="Load from Belimo (W)" ) ), legend=dict( title=dict( text="Series" ) ))fig7.show()
The above graphs and correlation coefficients show that there is a pretty strong positive correlation between the flow temperature and load, with correlation coeefcients of ~0.7. The correaltion between outside temperature and load much weaker but still present with smaller magnitude negative correlation coefficients.
I believe that there is enough of a correlation between these 2 variables and load to perform a linear regressionuse the former to predict the latter, thus these are the 2 variables I will choose to use.
Code
fig8 = go.Figure()fig8.add_trace(go.Scatter3d( name ="Fisher VT", x = data_8min["Outside-temperature"], z = data_8min["1_1"], y = data_8min["Fisher VT Flow-temperature"], mode ="markers", line=dict(color="#00CC96"), marker_size=1 ))fig8.add_trace(go.Scatter3d( name ="Cripps VT", x = data_8min["Outside-temperature"], z = data_8min["1_2"], y = data_8min["Cripps VT Flow-temperature"], mode ="markers", line=dict(color="#AB63FA"), marker_size=1 ))fig8.update_layout( title=dict( text="Load vs Outdoor Temperature vs Flow Temperature" ), scene =dict( xaxis=dict( title=dict( text="Outside Temperature (°C)" ) ), yaxis=dict( title=dict( text="Flow temperature (°C)" ) ), zaxis=dict( title=dict( text="Load (W)" ) )), legend=dict( title=dict( text="Series" ) ))fig8.show()
From the above plot we can see a trend between the 3 variables, looking at the graph from the top down we can clearly see the heating curves used to define the systems. There is, however, a lot of variation in the data, this is because the boiler cycles on and off in daily or bidaily cycles. During the cycling on and off, we end up with peaks as the boiler start up, steady state regions of high ~constant load, low power regions and 0 power (off) regions. These are annotated on the graph below:
image.png
For the scope of this document we are only interested in modelling the steady-state power, and since the other regions occur at regularly repeating time intervals we can easily remove them from the dataset before we train the model to improve results.
RMSE: 18895.998508006895 W
R-squared: 0.6324670244573225
MAPE: 0.07985614765220582
Code
fig9 = go.Figure()fig9.add_trace(go.Scatter3d( name ="Fisher VT Model", x =list(zip(*XF_test))[0], z = yF_pred, y =list(zip(*XF_test))[1], mode ="markers", line=dict(color="#AB63FA"), marker_size=1 ))fig9.add_trace(go.Scatter3d( name ="Fisher VT", x =list(zip(*XF_test))[0], z = yF_test, y =list(zip(*XF_test))[1], mode ="markers", line=dict(color="#00CC96"), marker_size=1 ))fig9.update_layout( title=dict( text="Load vs Normalized Outdoor Temperature vs Normalized Flow Temperature for the Testing Dataset" ), scene =dict( xaxis=dict( title=dict( text="Outside Temperature (norm.)" ) ), yaxis=dict( title=dict( text="Flow temperature (norm.)" ) ), zaxis=dict( title=dict( text="Load (W)" ) )), legend=dict( title=dict( text="Series" ) ))fig9.show()
Code
fig10 = go.Figure()fig10.add_trace(go.Scatter( name ="Fisher VT", x = data_8min.index, y = data_8min["1_1"], line=dict(color="#00CC96") ))fig10.add_trace(go.Scatter( name ="Training + Testing Dataset", x = data_toUse.index, y = data_toUse["1_1"], mode ='markers' ))fig10.add_trace(go.Scatter( name ="Predicted Values", x = data_8min.index, y = regrF.predict(scaler.fit_transform(data_8min[["Fisher VT Flow-temperature", "Outside-temperature"]])), line=dict(color="#AB63FA") ))fig10.update_layout( title=dict( text="Fisher VT Load vs Time" ), xaxis=dict( title=dict( text="Timestamp" ) ), yaxis=dict( title=dict( text="Load (W)" ) ), legend=dict( title=dict( text="Series" ) ))fig10.show()
We can see from the above graphs that the model predicts data in the region we would expect, being a pretty accurate line of best fit through the data. The time series shows that it is predicting the correct order of magnitude of values, hovering around the steady state value for the entire data set.
Looking at the performance statistics of the model, we can see that \(R^2 = 0.63\), a MAPE of \(\pm 8%\) and a RMSE of \(\pm 19\) kW. These indicate that approximatly 63% of the variation of the raw data can be explained by the model. Since the data is on the order of ~200 kW, the \(\pm 8%\) and \(\pm 19\) kW are consistant and indicate that the model is accurate to <10%.
RMSE: 11638.47340953614 W
R-squared: 0.7353293896184502
MAPE: 0.07669892433046585
Code
fig11 = go.Figure()fig11.add_trace(go.Scatter3d( name ="Cripps VT Model", x =list(zip(*XC_test))[0], z = yC_pred, y =list(zip(*XC_test))[1], mode ="markers", line=dict(color="#AB63FA"), marker_size=1 ))fig11.add_trace(go.Scatter3d( name ="Cripps VT", x =list(zip(*XC_test))[0], z = yC_test, y =list(zip(*XC_test))[1], mode ="markers", line=dict(color="#00CC96"), marker_size=1 ))fig11.update_layout( title=dict( text="Load vs Normalized Outdoor Temperature vs Normalized Flow Temperature for the Testing Dataset" ), scene =dict( xaxis=dict( title=dict( text="Outside Temperature (norm.)" ) ), yaxis=dict( title=dict( text="Flow temperature (norm.)" ) ), zaxis=dict( title=dict( text="Load (W)" ) )), legend=dict( title=dict( text="Series" ) ))fig11.show()
Code
fig12 = go.Figure()fig12.add_trace(go.Scatter( name ="Cripps VT", x = data_8min.index, y = data_8min["1_2"], line=dict(color="#00CC96") ))fig12.add_trace(go.Scatter( name ="Training + Testing Dataset", x = data_toUse.index, y = data_toUse["1_2"], mode ='markers' ))fig12.add_trace(go.Scatter( name ="Predicted Values", x = data_8min.index, y = regrC.predict(scaler.fit_transform(data_8min[["Cripps VT Flow-temperature", "Outside-temperature"]])), line=dict(color="#AB63FA") ))fig12.update_layout( title=dict( text="Cripps VT Load vs Time" ), xaxis=dict( title=dict( text="Timestamp" ) ), yaxis=dict( title=dict( text="Load (W)" ) ), legend=dict( title=dict( text="Series" ) ))fig12.show()
We can see from the above graphs that the model predicts data in the region we would expect, being a pretty accurate line of best fit through the data. The time series shows that it is predicting the correct order of magnitude of values, hovering around the steady state value for the entire data set. Although it appears from both the teim series ands catter graph to be slighlty on the low end of training dtaset.
Looking at the performance statistics of the model, we can see that \(R^2 = 0.74\), a MAPE of \(\pm 8%\) and a RMSE of \(\pm 12\) kW. This indicates that about 74% of the variation of the raw data can be explained by the model. Since the data is on the order of ~150 kW, the \(\pm 8%\) and \(\pm 11\) kW are consistant and indicate that the model is accurate to <10%.
Conclusions
The CT systems compose a neglgible part of the total load of the Y-Plantroom
The data from the BMS is consistant with the Belimo Data, although an offset is present this is likely due to callibration differences
We can use data from the BMS to create a linear model for load so that predictions can be made using sensors that we already have installed in the system
Both models are accurate to about \(\pm 10%\), this indicates that the models are likely accurate enough to provide an indication of the current load of each system but not precise enough for precise configuration of the systems.
Next Steps
Investigate more stastical means to test the models performance
Create models for the low power and peak power regions
Use the current models to forcast the load on each plantroom into the time periods where we no longer have Belimo Data
Compare maintanence records with the belimo data to find records of previous faults, compare these times to the Belimo and BMS data to identify what a fault looks like in the data provided. And then use this data to produce a model to predict faults going forward.
Use the Heating Curve and Load-Temperature characterisitc to determine when it is the BMS and when it is the weather that is causig the load.